
    LESIM: A Novel Lexical Similarity Measure Technique for Multimedia Information Retrieval

    Metadata-based similarity measurement is far from obsolete, despite research’s current focus on content and context. It enables aggregating information from textual references, measuring similarity when content is not available, traditional keyword search in search engines, merging results in meta-search engines, and many other activities of interest to research and industry. Existing similarity measures take into consideration neither the unique nature of multimedia metadata nor the requirements of metadata-based multimedia information retrieval. This work proposes a hybrid similarity measure customised for the commonly available author-title multimedia metadata, which is shown through experimentation to be significantly more effective than baseline measures.
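
    A minimal sketch of the general idea, not the paper’s LESIM formula (the abstract does not give it): a hybrid score blending a normalised edit distance on titles with a token-overlap similarity on author names. The field names and the 0.6/0.4 weighting are illustrative assumptions.

    # Illustrative hybrid author-title similarity; NOT the paper's LESIM measure.
    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def title_similarity(t1: str, t2: str) -> float:
        """Edit distance normalised to [0, 1]; 1 means identical titles."""
        longest = max(len(t1), len(t2)) or 1
        return 1.0 - levenshtein(t1.lower(), t2.lower()) / longest

    def author_similarity(a1: str, a2: str) -> float:
        """Jaccard overlap of author-name tokens, robust to token order."""
        tokens = lambda s: set(s.lower().replace(",", " ").split())
        s1, s2 = tokens(a1), tokens(a2)
        return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

    def hybrid_similarity(r1: dict, r2: dict, w_title: float = 0.6) -> float:
        """Weighted blend of the two field-level similarities."""
        return (w_title * title_similarity(r1["title"], r2["title"])
                + (1 - w_title) * author_similarity(r1["author"], r2["author"]))

    print(hybrid_similarity({"title": "Abbey Road", "author": "The Beatles"},
                            {"title": "Abby Road", "author": "Beatles, The"}))  # 0.94

    Blending field-level scores like this is what lets a metadata-only measure tolerate reordered author names while still penalising title mismatches.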

    Discovering Influential Twitter Authors Via Clustering And Ranking On Apache Storm

    Nowadays several millions of people are throughout the day active, while hundreds of new accounts are created daily on social media. Thousands of short-length posts or tweets are posted on Twitter, a popular micro-blogging platform by a vast variety of authors and thus creating a widely diverse social content. The emerged diversity not only does indicate a remarkable strength, but also reveals a certain kind of difficulty when attempting to find Twitter’s authoritative and influencing authors. This work introduces a two-step algorithmic approach for discovering these authors. A set of metrics and features are, firstly, extracted from the social network e.g. friends and followers and the content of the tweets written by the author are extracted. Then, Twitter’s most authoritative authors are discovered by employing two distinct approaches, one which relies on probabilistic while the other applies fuzzy clustering. In particular, the former, initially, employs the Gaussian Mixture Model to identify the most authoritative authors and then introduces a novel ranking technique which relies on computing the cumulative Gaussian distribution of the extracted metrics and features. On the other hand, the latter combines the Gaussian Mixture Model with fuzzy c-means and subsequently the derived authors are ranked via the Borda count technique. The results indicate that the second scheme was able to find more authoritative authors in the benchmark dataset. Both approaches were designed, implemented, and executed on a local cluster of the Apache Storm framework, a cloud-based platform which supports streaming data and real-time scenarios
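
    A rough sketch of the first (probabilistic) scheme under stated assumptions: the per-author features (followers, retweets received, mentions), the two-component mixture, and the product as the way to aggregate the per-feature cumulative Gaussian scores are all illustrative choices, since the abstract does not pin them down.

    import numpy as np
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Rows are authors; columns are assumed features,
    # e.g. followers, retweets received, mentions.
    X = rng.lognormal(mean=2.0, sigma=1.0, size=(500, 3))

    # Step 1: fit a Gaussian Mixture Model and keep the authors assigned
    # to the component with the largest mean features ("authoritative").
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    labels = gmm.predict(X)
    top = int(np.argmax(gmm.means_.sum(axis=1)))
    candidates = np.where(labels == top)[0]

    # Step 2: rank candidates by the product of per-feature Gaussian CDFs,
    # i.e. how far into the upper tail of every feature each author sits.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    scores = norm.cdf(X[candidates], loc=mu, scale=sigma).prod(axis=1)
    ranking = candidates[np.argsort(scores)[::-1]]
    print("top-10 author indices:", ranking[:10])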

    Documentation of clinically relevant genomic biomarker allele frequencies in the next-generation FINDbase worldwide database

    FINDbase (http://www.findbase.org) is a comprehensive data resource recording the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants underlying genetic disorders as well as pharmacogenomic biomarkers that can guide drug treatment. Here, we report significant new developments and technological advancements in the database architecture, leading to a completely revamped database structure and querying interface, accompanied by substantial extensions of data content and curation. In particular, the FINDbase upgrade improves the user experience by introducing responsive features that support a wide variety of mobile and stationary devices, while improving runtime performance through the use of a modern JavaScript framework, ReactJS. Data collection is significantly enriched, with the data records divided into a public and a private version, the latter accessible on the basis of data contribution according to the microattribution approach, while the front end was redesigned to support the new functionalities and querying tools. These updates further enhance the impact of FINDbase, improve the overall user experience, facilitate further data sharing through microattribution, and strengthen the role of FINDbase as a key resource for personalized medicine and personalized public health applications.

    Fuzzy Random Walkers with Second Order Bounds: An Asymmetric Analysis

    Edge-fuzzy graphs constitute an essential modeling paradigm across a broad spectrum of domains, ranging from artificial intelligence to computational neuroscience and social network analysis. Under this model, fundamental graph properties such as edge length and graph diameter become stochastic and are consequently expressed in probabilistic terms. Thus, algorithms for fuzzy graph analysis must rely on non-deterministic design principles. One such principle is the random walker, a virtual entity that selects either edges or, as in this case, vertices of a fuzzy graph to visit. This allows global graph properties to be estimated through a long sequence of local decisions, making it a viable strategy for graph processing software built on native graph databases such as Neo4j. As a concrete example, Chebyshev Walktrap, a heuristic fuzzy community discovery algorithm relying on second-order statistics and on teleportation of the random walker, is proposed, and its performance, expressed in terms of community coherence and number of vertex visits, is compared to the previously proposed Markov Walktrap, Fuzzy Walktrap, and Fuzzy Newman–Girvan algorithms. To facilitate this comparison, a metric based on the asymmetric Tversky index and Kullback–Leibler divergence is used.
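
    A minimal sketch of the underlying primitive, a vertex-visiting random walker with teleportation on an edge-fuzzy graph, where each edge weight in (0, 1] is read as a membership degree. The toy graph, the 0.15 teleport probability, and the visit-counting are illustrative assumptions; this is not the Chebyshev Walktrap algorithm itself.

    import random
    from collections import Counter

    fuzzy_graph = {                    # adjacency: vertex -> {neighbour: membership}
        "a": {"b": 0.9, "c": 0.4},
        "b": {"a": 0.9, "c": 0.7},
        "c": {"a": 0.4, "b": 0.7, "d": 0.1},
        "d": {"c": 0.1},
    }

    def walk(graph, steps=10_000, teleport=0.15, seed=42):
        rng = random.Random(seed)
        visits = Counter()
        v = rng.choice(list(graph))
        for _ in range(steps):
            visits[v] += 1
            if rng.random() < teleport or not graph[v]:
                v = rng.choice(list(graph))          # teleport to a random vertex
            else:
                nbrs, w = zip(*graph[v].items())
                v = rng.choices(nbrs, weights=w)[0]  # pick neighbour by membership
        return visits

    # Visit frequencies approximate the walker's stationary distribution;
    # vertices inside a tightly-knit (high-membership) region score similarly,
    # which is what community-discovery heuristics exploit.
    print(walk(fuzzy_graph).most_common())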

    Exploring Clustering Techniques for Analyzing User Engagement Patterns in Twitter Data

    Social media platforms have revolutionized information exchange and socialization. Twitter, one of the most prominent platforms, enables users to connect with others and express their opinions. This study analyzes user engagement levels on Twitter using graph mining and clustering techniques. We measure user engagement based on various tweet attributes, including retweets, replies, and more. Specifically, we explore the strength of user connections in Twitter networks by examining the diversity of edges. Our approach incorporates graph mining models that assign different weights to evaluate the significance of each connection. Additionally, clustering techniques are employed to group users based on their engagement patterns and behaviors. Statistical analysis was conducted to assess the similarity between user profiles, as well as attributes such as friendships, followings, and interactions within the Twitter social network. The findings highlight the discovery of closely linked user groups and the identification of distinct clusters based on engagement levels. This research emphasizes the importance of understanding both individual and group behaviors in comprehending user engagement dynamics on Twitter.
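
    An illustrative sketch of the clustering step on synthetic data: the engagement features (retweets, replies, likes) and the choice of k-means with k = 2 are assumptions, since this abstract does not name a specific algorithm.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    # Rows are users; columns: retweets made, replies made, likes given.
    engagement = np.vstack([
        rng.poisson(lam=(2, 1, 5), size=(300, 3)),    # low-engagement users
        rng.poisson(lam=(40, 25, 90), size=(60, 3)),  # high-engagement users
    ])

    X = StandardScaler().fit_transform(engagement)    # put features on one scale
    labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)

    for c in range(2):
        print(f"cluster {c}: {np.sum(labels == c)} users, "
              f"mean engagement {engagement[labels == c].mean(axis=0).round(1)}")

    Standardising before clustering matters here: raw counts such as likes dominate retweets by an order of magnitude and would otherwise skew the distance metric.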

    Customer Behaviour Analysis for Recommendation of Supermarket Ware

    Part 10: Mining Humanistic Data Workshop (MHDW). In this paper, we present a prediction model based on the behaviour of each customer using data mining techniques. The proposed model utilizes a supermarket database and an additional database from Amazon, both containing information about customers’ purchases. Our model analyzes these data in order to classify both customers and products, and it is trained and validated with real data. The model is targeted towards classifying customers according to their consuming behaviour and consequently proposing new products they are more likely to purchase. The corresponding prediction model is intended as a tool for marketers, providing analytically targeted insight into consumer behaviour.
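
    A hedged sketch of the classify-then-recommend idea on synthetic data: the per-customer features, the random-forest classifier, and the per-segment product lists are all hypothetical stand-ins for the paper’s setup.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(7)
    # Assumed per-customer features: visits/month, avg basket size, avg spend.
    X = rng.normal(loc=(8, 12, 40), scale=(3, 4, 15), size=(400, 3))
    # Synthetic training segments, e.g. "budget" (0) vs "premium" (1) shoppers.
    y = (X[:, 2] + rng.normal(0, 5, 400) > 45).astype(int)

    clf = RandomForestClassifier(n_estimators=100, random_state=7).fit(X, y)

    # Hypothetical lookup of products most purchased within each segment.
    popular = {0: ["store-brand pasta", "tinned tomatoes"],
               1: ["aged cheese", "olive oil"]}

    new_customer = np.array([[10, 15, 70]])  # frequent, large, high-spend baskets
    segment = int(clf.predict(new_customer)[0])
    print(f"segment {segment}; suggest: {popular[segment]}")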

    Forecasting Air Flight Delays and Enabling Smart Airport Services in Apache Spark

    Part 6: 10th Mining Humanistic Data Workshop (MHDW 2021). In light of rapidly growing passenger and flight volumes, airports seek sustainable solutions to improve passengers’ experience and comfort while maximizing profits. A major technological route towards improving service quality and management processes in airports comprises Internet of Things (IoT) systems, which realize the concept of smart airports and offer interconnection potential with other public infrastructures and utilities of smart cities. To deliver smart airport services, real-time flight delay data and forecasts are a critical source of information. This paper introduces a methodology that uses machine learning techniques on Apache Spark, a cloud computing framework, together with MLlib, its machine learning library, to develop prediction models for air flight delays that can be integrated with information systems to provide up-to-date analytics. The experiments were conducted with various algorithms for both classification and regression, demonstrating the potential of the proposed framework.
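
    A minimal PySpark sketch in the spirit of this methodology: train a classifier on historical flight records to flag delayed flights. The column names and the tiny in-memory dataset are illustrative assumptions; a real deployment would ingest live airport feeds and likely compare several MLlib algorithms.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("flight-delay-sketch").getOrCreate()

    rows = [  # (departure hour, distance km, day of week, delayed 0/1)
        (7, 450.0, 1, 0.0), (18, 1200.0, 5, 1.0), (21, 300.0, 5, 1.0),
        (9, 800.0, 2, 0.0), (17, 950.0, 4, 1.0), (6, 600.0, 3, 0.0),
    ]
    df = spark.createDataFrame(rows, ["dep_hour", "distance", "dow", "delayed"])

    # Assemble the numeric columns into the single feature vector MLlib expects.
    features = VectorAssembler(
        inputCols=["dep_hour", "distance", "dow"], outputCol="features"
    ).transform(df)

    model = LogisticRegression(labelCol="delayed").fit(features)
    model.transform(features).select("delayed", "prediction").show()

    spark.stop()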